Abstract

We propose a new style of operating system architecture appropriate for microkernel-based operating systems:
services are implemented as a combination of shared libraries and dedicated server processes. Shared
libraries implement performance critical portions of each system service, while dedicated servers implement
the parts of each service that do not require high performance or that are difficult to implement in an application.
Our initial experiments show that this approach to operating system structure can yield performance
that is comparable to monolithic kernel systems while retaining all the modularity advantages of microkernel
technology. Since services reside in libraries, an application is free to use the library that is most appropriate.
This approach can even yield better performance than monolithic kernel systems by allowing the shared
libraries to be closely coupled with the applications, thereby exploiting application-specific knowledge in
policy decisions.
1. Introduction


In  the  past  few  years,  there  has  been  dramatic  growth  in  the  number  and  quality  of  microkernels.  As 
of  this  writing,  several  commercial  operating  systems  based  on  microkernel  technology  exist  or  are  under 
development [Phelan et al. 93, Hildebrand 92, Rozier et al. 92, Zajcew et al. 93]. Current practice is to
structure  a  microkernel  operating  system  as  one  or  more  server  processes  that  collectively  implement  the 
operating system services [Golub et al. 90, Julin et al. 91, Rozier et al. 92, Khalidi & Nelson 93, Hildebrand
92]. This approach implicitly models the operating system as a distributed system where services reside in
“remote”  server  processes  that  happen  to  be  on  the  same  machine.  However,  the  communication  overhead 
incurred when contacting these servers can result in poor performance. In latency-sensitive applications like
those requiring high-speed networking, the extra cost for applications to communicate with
“remote” server processes is unacceptable because the latency for an interprocess RPC is comparable to the
network round-trip latency [Brustoloni & Bershad 93, Oraves et al. 91].

In  this  position  paper,  we  propose  a  new  style  of  operating  system  architecture  appropriate  for  microkernel- 
based  operating  systems:  services  are  implemented  as  a  combination  of  shared  libraries  and  dedicated  server 
processes.  Shared  libraries  implement  performance  critical  portions  of  each  system  service,  while  dedicated 
servers  implement  the  parts  of  each  service  that  do  not  require  high  performance  or  that  are  difficult  to 
implement  in  an  application.  Dedicated  servers  might  be  used,  for  example,  to  manage  shared  state  that 
must  persist  across  process  lifetimes  or  to  implement  high-level  abstractions  that  are  difficult  or  impossible 
to  provide  in  a  library. 

Our  initial  experiments  show  that  this  approach  to  operating  system  structure  can  yield  performance  that 
is comparable to monolithic kernel systems while retaining all the modularity advantages that led industry
to  adopt  microkernel  technology  in  the  first  place.  Since  services  reside  in  libraries,  an  application  is  free  to 
use  the  library  that  is  most  appropriate.  This  approach  can  even  yield  better  performance  than  monolithic 
kernel  systems  by  allowing  the  shared  libraries  to  be  closely  coupled  with  the  applications,  thereby  exploiting 
application-specific  knowledge  in  policy  decisions. 

In  the  next  section,  we  present  our  approach  to  structuring  system  services  and  discuss  the  role  of  the 
kernel,  servers,  and  application-level  libraries.  In  Section  3  we  describe  how  we  applied  our  approach  to  the 
implementation of a networking service (a more complete description can be found in [Maeda & Bershad
93]). In Section 4 we suggest how our approach may be applied to a filesystem service, and discuss the
potential  benefits  of  doing  so.  We  summarize  in  Section  5. 


2.  Service  structure 


Operating systems generally perform two functions: they allocate machine resources, such as physical memory,
processors, and I/O capacity, and they provide high-level abstractions like filesystems, processes, and
I/O channels. In our model, the kernel is simply a global resource scheduler. Services external to the kernel
provide abstractions and a means for applications to acquire and use resources. The implementation of a service
is split into global and application-specific parts that reside in servers and shared libraries, respectively.
By  splitting  the  implementation  in  this  way,  applications  can  make  more  effective  use  of  their  resources 
without  forfeiting  security  or  generality. 

Services  acquire  resources  from  the  kernel  and  manage  them  using  a  global  server  or  application-specific 
libraries.  Global  servers  can  be  used  to  manage  shared  abstractions  or  large  blocks  of  resources  delegated 
by the kernel, such as a disk partition. The application-specific libraries provide information to the kernel
about special resource requirements and manage resources dedicated to the application, such as a single
network connection. A kernel in such a system must provide the following:

• a global resource scheduling policy.

• adequate means to enforce resource scheduling decisions (protection).

•  a  way  to  inform  services  about  resource  scheduling  decisions. 

•  a  way  for  services  to  provide  hints  to  the  kernel  to  influence  resource  scheduling  decisions. 
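
The four requirements above can be read as a small interface between the kernel and the services. The following Python sketch is only an illustration under stated assumptions: the class `ResourceScheduler`, its method names, and the proportional-share policy are hypothetical and do not describe any existing kernel interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ResourceScheduler:
    """Hypothetical global scheduler for one divisible resource (e.g. pages)."""
    capacity: int
    hints: Dict[str, int] = field(default_factory=dict)    # service -> units wanted
    grants: Dict[str, int] = field(default_factory=dict)   # service -> units granted
    listeners: Dict[str, Callable[[int], None]] = field(default_factory=dict)

    def register(self, service: str, on_change: Callable[[int], None]) -> None:
        """A service registers a callback to learn about scheduling decisions."""
        self.listeners[service] = on_change
        self.hints.setdefault(service, 0)
        self.grants.setdefault(service, 0)

    def hint(self, service: str, wanted: int) -> None:
        """A service tells the kernel how much of the resource it could use."""
        self.hints[service] = wanted
        self._reschedule()

    def allowed(self, service: str, amount: int) -> bool:
        """Enforcement: a service may never use more than its current grant."""
        return amount <= self.grants.get(service, 0)

    def _reschedule(self) -> None:
        """Assumed policy: grant in full if possible, else share proportionally."""
        total = sum(self.hints.values())
        for service, wanted in self.hints.items():
            if total <= self.capacity:
                grant = wanted
            else:
                grant = wanted * self.capacity // total
            if self.grants.get(service) != grant:
                self.grants[service] = grant
                self.listeners[service](grant)   # inform the service of the change
```

In this sketch a hint from one service can shrink the grant of another, and the affected service learns of the change through its callback rather than by polling.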


There  are  two  performance  benefits  to  our  model.  The  first  is  that  the  shared  library  can  exploit 
application-specific  knowledge  in  managing  resources  allocated  by  the  kernel.  The  second  benefit  of  our 
model  is  that  performance-critical  parts  of  the  service  can  be  implemented  in  the  shared  library,  thereby 
avoiding  the  extra  communication  overhead  associated  with  a  remote  server  process. 


Some  examples 

The  operating  systems  community  has  been  “dabbling”  with  the  resource  manager  model  for  some  time 
now. Early personal computer operating systems [Redell et al. 80, Moon 91] ran all system services and
applications  in  a  single  address  space,  which  enabled  applications  to  be  tightly  coupled  with  the  operating 
system.  However,  these  systems  provided  no  protection  against  rogue  or  buggy  applications  that  crash  the 
system  or  use  the  hardware  to  attack  other  systems. 

Other work spans file systems [Rees et al. 86, Bershad & Pinkerton 88], scheduling [Anderson et al. 92],
communication [Bershad et al. 91], and user-level memory management [McNamee & Armstrong 90, Harty &
Cheriton 92, Sechrest & Park 91, Krueger et al. 93]. The work in extensible filesystems permits applications
to  extend  the  semantics  of  files  on  a  per-file  basis.  However,  this  work  still  leaves  all  resource  scheduling 
decisions  to  the  operating  system.  With  scheduler  activations,  the  kernel  globally  allocates  processors  to 
applications  and  informs  them  when  their  processor  allocation  changes.  The  applications  provide  hints 
about  when  changes  in  processor  allocation  would  be  useful  and  use  the  processor  allocation  information 
to  implement  a  high-level  threads  library.  URPC  is  an  IPC  library  that  uses  shared  memory  to  implement 
low-latency  IPC.  The  library  relies  on  the  kernel’s  scheduler  to  perform  processor  allocation  in  response  to 
outstanding  messages.  Implementations  of  user-level  memory  management  permit  applications  to  determine 
the  page  replacement  policy  for  their  virtual  memory.  The  kernel  allocates  physical  memory  to  applications 
while  the  applications  determine  virtual  to  physical  mappings.  (In  contrast,  the  Mach  External  Pager  [Young 
89]  simply  enables  applications  to  implement  backing  store  for  parts  of  their  virtual  memory.) 


3.  A  networking  service 


In  this  section,  we  describe  an  implementation  of  networking  protocols  that  runs  as  a  library-level  service, 
and that relies on a central server for a few critical operations. In our networking service, a library implements
a complete TCP/IP and UDP/IP stack that communicates directly with an in-kernel Ethernet
device driver. Incoming packets are demultiplexed to application address spaces using the packet filter, a
protocol-independent packet demultiplexing facility [Mogul et al. 87, Yuhara et al. 94]. The library takes
raw  Ethernet  packets  from  the  kernel  and  does  all  protocol  stack  processing  before  handing  the  data  to  the 
application.  Similarly,  outgoing  data  is  formatted  into  TCP/IP  or  UDP/IP  messages  before  being  sent  to 
the  in-kernel  Ethernet  driver.  The  proper  level  of  distrust  is  maintained  because  the  packet  filter  ensures 
that  applications  only  see  the  packets  that  they  are  supposed  to  see.  A  similar  mechanism  could  be  applied 
to  outgoing  packets  to  ensure  that  applications  only  send  packets  that  they  are  allowed  to  send. 
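
As a rough illustration of the demultiplexing step, the sketch below delivers a raw Ethernet frame only to the endpoint bound to its UDP destination port. It is a simplification under stated assumptions: the real packet filter is a protocol-independent facility driven by filter programs, whereas here the matching rule is hard-coded for UDP over IPv4, IP fragments are ignored, and the names `udp_dst_port`, `make_udp_frame`, and `PacketFilterDemux` are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def udp_dst_port(frame: bytes):
    """Return the UDP destination port of an Ethernet/IPv4 frame, else None."""
    if len(frame) < 14 + 20:
        return None
    if struct.unpack_from("!H", frame, 12)[0] != ETHERTYPE_IPV4:
        return None
    version, ihl = frame[14] >> 4, (frame[14] & 0x0F) * 4
    if version != 4 or ihl < 20 or frame[14 + 9] != IPPROTO_UDP:
        return None
    udp_offset = 14 + ihl
    if len(frame) < udp_offset + 8:
        return None
    return struct.unpack_from("!H", frame, udp_offset + 2)[0]


def make_udp_frame(dst_port: int, payload: bytes = b"") -> bytes:
    """Build a minimal frame for experiments (addresses and checksums zeroed)."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28 + len(payload),
                     0, 0, 64, IPPROTO_UDP, 0, bytes(4), bytes(4))
    udp = struct.pack("!HHHH", 12345, dst_port, 8 + len(payload), 0)
    return eth + ip + udp + payload


class PacketFilterDemux:
    """Hypothetical in-kernel table: one receive queue per bound UDP port."""

    def __init__(self):
        self.endpoints = {}

    def bind(self, port: int) -> list:
        queue = []
        self.endpoints[port] = queue
        return queue

    def deliver(self, frame: bytes) -> bool:
        """Hand the frame to the matching application, or drop it."""
        queue = self.endpoints.get(udp_dst_port(frame))
        if queue is None:
            return False
        queue.append(frame)
        return True
```

An application that has bound one port never observes frames addressed to another, which is the protection property the packet filter provides.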

The  library  is  linked  with  applications  and  cooperates  with  a  server  process  to  manage  host-level  state 
such  as  routing  tables  and  to  support  the  BSD  Sockets  API  (see  Figure  1).  The  Sockets  API  is  difficult  to 
emulate  in  an  application-level  library  because  network  sessions  are  represented  as  file  descriptors  which  have 
complex semantics due to system calls such as fork and select. Two techniques, session state migration and
co-management  of  abstractions,  are  instrumental  in  emulating  the  complex  semantics  of  Unix  file  descriptors. 


Figure  1:  Schematic  of  a  networking  service  implementation  where  critical-path  functionality  is  implemented 
by libraries in the application’s address space. A server process manages shared databases, handles connection
set up, and implements high-level abstractions.


Session  state  migration  is  used  to  set  up  sessions  and  to  enable  sharing.  The  session  state  for  a  TCP 
connection,  for  example,  consists  of  the  connection  state  variables  ([Postel  81]),  and  any  data  buffered  by  the 
protocol stack. The server process handles the TCP connection setup protocol and migrates the established
TCP session to the application during an accept call. The protocol stack in the application and the in-kernel
device driver handle all subsequent data transfer; no interaction with the server is required. When
an  application  does  a  fork  system  call,  the  parent  and  child  processes  of  the  fork  must  each  have  a  file 
descriptor  that  refers  to  the  same  network  session.  Before  the  fork,  the  network  session  state  is  migrated 
back  to  the  server  process  so  that  it  may  be  shared. 
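
The ownership hand-off described above might be sketched as follows. This is a minimal model under stated assumptions, not the actual implementation: the classes `TcpSessionState`, `ProtocolServer`, and `LibraryStack` are hypothetical, only two of the TCP state variables are modelled, and the transfer is an in-process move rather than an IPC.

```python
from dataclasses import dataclass


@dataclass
class TcpSessionState:
    """A small, assumed subset of the per-connection state that migrates."""
    local: tuple
    remote: tuple
    snd_nxt: int
    rcv_nxt: int
    recv_buffer: bytes = b""


class ProtocolServer:
    """Handles connection setup and holds sessions that must be shared."""

    def __init__(self):
        self.owned = {}   # session id -> state currently held by the server

    def connection_established(self, sid, state):
        self.owned[sid] = state

    def accept(self, sid):
        """Migrate the session to the caller; the server leaves the data path."""
        return self.owned.pop(sid)

    def reclaim(self, sid, state):
        """Take a session back, e.g. so parent and child can share it."""
        self.owned[sid] = state


class LibraryStack:
    """Application-level protocol stack that owns migrated sessions."""

    def __init__(self, server):
        self.server = server
        self.sessions = {}

    def accept(self, sid):
        self.sessions[sid] = self.server.accept(sid)

    def receive(self, sid, data: bytes):
        """Data transfer touches only library state, never the server."""
        state = self.sessions[sid]
        state.recv_buffer += data
        state.rcv_nxt = (state.rcv_nxt + len(data)) % 2**32

    def before_fork(self):
        """Return every session, buffered data included, to the server."""
        for sid in list(self.sessions):
            self.server.reclaim(sid, self.sessions.pop(sid))
```

Exactly one party owns a session at any time, so the common case (data transfer) needs no coordination and only the rare case (fork) pays for a migration.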

Co-management  of  abstractions  is  used  to  emulate  the  select  system  call.  The  operating  system  knows 
about file descriptors that are managed by applications and exports an interface by which the applications
can inform the operating system when the status of a file descriptor changes. When the library learns that
a file descriptor has changed status, it informs the operating system, which forces any blocked select calls
to  return. 
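
The notification path can be modelled with a condition variable. The sketch below is an assumption-laden illustration: `DescriptorTable`, `notify_status`, and this simplified `select` (readability only) are hypothetical stand-ins for the operating system side of the interface.

```python
import threading


class DescriptorTable:
    """Hypothetical OS-side record of descriptors managed by applications."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readable = set()

    def notify_status(self, fd: int, readable: bool) -> None:
        """Called by the library when a descriptor's status changes."""
        with self._cond:
            if readable:
                self._readable.add(fd)
            else:
                self._readable.discard(fd)
            self._cond.notify_all()   # wake any blocked select calls

    def select(self, fds, timeout=None):
        """Block until one of fds is readable or the timeout expires."""
        wanted = set(fds)
        with self._cond:
            self._cond.wait_for(lambda: wanted & self._readable, timeout)
            return sorted(wanted & self._readable)
```

Because the predicate is checked before waiting, a status change that arrives just before the select call is not lost.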

The global resource managed by the kernel is network capacity. When an application acquires a packet
filter  port,  it  acquires  a  portion  of  network  capacity  on  which  it  can  apply  a  protocol.  In  the  current 
implementation,  this  resource  allocation  aspect  is  implicit;  we  assume  that  Ethernet  bandwidth  is  infinite. 
However,  the  kernel  could  explicitly  allocate  network  resources  if  applications  specified  a  quality  of  service 
(QOS)  [Kurose  93]  at  session  establishment  time,  and  if  the  kernel  enforced  QOS  by  penalizing  applications 
that  exceed  their  limits. 

The performance of our system is comparable to native in-kernel implementations. Between two DECstation
5000/200s running in single-user mode on a private Ethernet, the round-trip latency for 1-byte UDP
messages is 1.50 ms in our system and 1.45 ms for the Mach 2.5 integrated kernel. Round-trip latency for
1-byte TCP messages is 1.40 ms in Mach 2.5 and 1.75 ms in our system. For TCP throughput, the Mach 2.5
kernel achieves 1070 kilobytes/second while our system achieves 995 kilobytes/second. Our library lags a
few percent behind Mach 2.5 because we copy incoming packets (including protocol headers) one extra time in our
system.  In  contrast,  both  implementations  substantially  outperform  the  Mach  3.0  Unix  Server,  where  the 
round-trip  latency  for  1  byte  messages  is  3.61  ms  for  UDP,  3.64  ms  for  TCP,  and  where  TCP  throughput  is 
740  kilobytes/second. 
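
The relative differences can be computed directly from the figures quoted above. The short script below is only a convenience for that arithmetic; the dictionary keys and helper names are hypothetical, and the numbers are copied from the text as printed.

```python
# Figures as printed in the text: latency in ms, throughput in kilobytes/second.
measurements = {
    "library":     {"udp_ms": 1.50, "tcp_ms": 1.75, "tcp_kbs": 995},
    "mach_2_5":    {"udp_ms": 1.45, "tcp_ms": 1.40, "tcp_kbs": 1070},
    "unix_server": {"udp_ms": 3.61, "tcp_ms": 3.64, "tcp_kbs": 740},
}


def percent_slower(ours: float, baseline: float) -> float:
    """How much larger `ours` is than `baseline`, in percent."""
    return round((ours - baseline) / baseline * 100, 1)


def percent_lower(ours: float, baseline: float) -> float:
    """How much smaller `ours` is than `baseline`, in percent."""
    return round((baseline - ours) / baseline * 100, 1)


def ratio(a: float, b: float) -> float:
    """Plain ratio a / b, rounded to two decimals."""
    return round(a / b, 2)


if __name__ == "__main__":
    lib, kern, srv = (measurements[k] for k in ("library", "mach_2_5", "unix_server"))
    print("UDP latency vs Mach 2.5 (%):", percent_slower(lib["udp_ms"], kern["udp_ms"]))
    print("TCP latency vs Mach 2.5 (%):", percent_slower(lib["tcp_ms"], kern["tcp_ms"]))
    print("TCP throughput vs Mach 2.5 (%):", percent_lower(lib["tcp_kbs"], kern["tcp_kbs"]))
    print("Unix Server / library UDP latency:", ratio(srv["udp_ms"], lib["udp_ms"]))
    print("Unix Server / library TCP latency:", ratio(srv["tcp_ms"], lib["tcp_ms"]))
```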

Once  we  have  a  user-level  implementation,  we  can  achieve  further  performance  gains  by  more  tightly 
coupling the application with the protocol implementation. For example, we can change the API to return a
buffer from a socket call, rather than filling one in, eliminating two data copies during round-trip operations.
UDP round-trip latency drops to 1.46 ms and TCP latency drops to 1.72 ms. TCP throughput only improves
by 1% (to 1002 kilobytes/second) because the copies eliminated by this change are not on the critical path for
high  throughput. 
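
The difference between the two API styles can be sketched as follows. This is an illustration under stated assumptions: `LibrarySocket`, `recv_into`, and `recv_buffer` are hypothetical names, and the `copies` counter exists only to make the saved copy visible.

```python
from collections import deque


class LibrarySocket:
    """Hypothetical library-level socket holding packets it has received."""

    def __init__(self):
        self._queue = deque()
        self.copies = 0   # number of payload copies made on the receive path

    def deliver(self, packet: bytearray) -> None:
        """Called by the protocol stack when a packet's payload is ready."""
        self._queue.append(packet)

    def recv_into(self, buf: bytearray) -> int:
        """Conventional style: copy the payload into a caller-supplied buffer."""
        packet = self._queue.popleft()
        n = min(len(buf), len(packet))
        buf[:n] = memoryview(packet)[:n]   # the one copy on this path
        self.copies += 1
        return n

    def recv_buffer(self) -> memoryview:
        """Alternative style: return a view of the library's own buffer."""
        return memoryview(self._queue.popleft())
```

The second style is only possible because the protocol stack and the application share an address space; the caller reads the payload in place instead of receiving a copy.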


4.  A  proposed  filesystem  service 


At  first  glance,  it  is  not  clear  how  application-specific  libraries  can  provide  any  performance  improvement  for 
filesystems.  Since  many  applications  share  the  same  file  system,  the  filesystem  metadata  should  be  managed 
by  a  single  server  to  which  the  kernel  has  delegated  responsibility  for  the  disk  partition.  Furthermore,  disks 
are slow enough compared to the cost of an interprocess RPC that there is little additional overhead involved
with  accessing  data  through  a  dedicated  file  server  process  [Bershad  92].  However,  given  current  trends  in 
CPU,  memory  system,  and  disk  performance,  applications  will  require  a  fast  path  to  a  disk  block  cache 
which rarely misses in order to perform well. By “fast,” we mean that there isn’t time to even copy the
data  out  of  the  buffer  cache  into  the  client’s  address  space  [Ousterhout  90],  let  alone  fetch  it  from  disk. 
Consequently,  the  data  must  effectively  be  cached  in  the  client’s  address  space  before  it  is  accessed. 

One  way,  then,  to  apply  our  resource  management  methodology  to  a  file  system  service  is  to  have  a  server 
that  manages  the  disk,  along  with  a  shared  library  that  implements  a  distributed  buffer  cache.  Instead  of 
having  a  buffer  cache  that  resides  in  the  address  space  of  a  single  server,  the  buffer  cache  pages  can  be 
mapped into the address space of the application that is most likely to use them. This approach means that
an  application  can  access  data  in  the  buffer  cache  with  a  simple  procedure  call  and  without  having  to  copy 
the  data  out  of  the  cache. 

While  this  approach  uses  the  same  mechanism  as  memory-mapped  files,  the  policy  is  completely  different. 
Memory-mapped  files  use  a  reactive  policy  where  file  data  is  not  brought  in  from  disk  and  mapped  into  the 
application’s  virtual  address  space  until  it  is  referenced,  forcing  the  application  to  wait.  Future  applications, 
such  as  those  in  databases,  multimedia,  and  scientific  computing,  will  require  a  more  proactive  policy  where 
data  is  present  in  physical  memory  before  it  is  first  referenced. 

Filesystem  performance  can  benefit  from  application-specific  information  in  two  ways.  The  application 
can  provide  hints  about  future  usage  to  the  filesystem  server  to  help  it  schedule  disk  traffic  [Patterson  et  al. 
93].  This  will  result  in  more  effective  prefetching  policies  and  lower  buffer  cache  miss  rates.  An  effective 
prefetching  policy  will  also  move  virtual  memory  remapping  operations  off  the  critical  path  since  disk  blocks 
will already be mapped into the application address space when they are needed. In addition, the application
can  tell  the  kernel  about  how  it  will  use  the  buffer  cache  so  that  the  kernel  can  make  informed  decisions 
about  physical  memory  allocation  [Stonebraker  81]. 
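
The effect of usage hints on the miss rate can be illustrated with a toy cache. The sketch below rests on several assumptions: `BufferCache` and its `hint` method are hypothetical, the disk is a dictionary, replacement is plain LRU, and prefetching is modelled as synchronous, whereas a real server would overlap it with computation.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU block cache that accepts prefetch hints from the application."""

    def __init__(self, capacity: int, disk: dict):
        self.capacity = capacity
        self.disk = disk              # block number -> block contents
        self.pages = OrderedDict()    # cached blocks, least recently used first
        self.misses = 0               # reads that had to wait for the disk

    def _install(self, block: int) -> None:
        self.pages[block] = self.disk[block]
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used

    def hint(self, blocks) -> None:
        """Application announces blocks it expects to read soon."""
        for block in blocks:
            if block not in self.pages:
                self._install(block)

    def read(self, block: int) -> bytes:
        if block in self.pages:
            self.pages.move_to_end(block)    # mark as recently used
        else:
            self.misses += 1                 # the application would stall here
            self._install(block)
        return self.pages[block]
```

With a purely reactive policy every first reference is a miss; with hints issued in advance, the same reference string can be served entirely from the cache.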


5.  Summary 


One  way  for  microkernel  operating  systems  to  match  the  performance  of  integrated  kernel  systems  is  to 
implement  services  as  a  combination  of  application-level  shared  libraries  and  dedicated  server  processes.  We 
have  implemented  a  networking  service  in  this  “distributed”  fashion  that  matches  the  performance  of  an 
integrated  kernel  and  we  have  demonstrated  that  tighter  coupling  between  the  application  and  the  library 
can  result  in  even  better  performance.  We  expect  that  other  operating  system  services  can  be  designed  this 
way;  library-based  implementations  of  scheduling,  IPC,  and  memory  management  have  been  reported  in  the 
literature,  and  we  have  described  a  way  that  this  approach  could  be  applied  to  file  systems. 

An open problem is how to design an infrastructure that enables “distributed” implementations of operating
system services. The most important part of this infrastructure is a kernel that informs applications
about its resource scheduling decisions and that can use hints about application requirements in its resource
scheduling decisions.